In association tests of sites with low minor allele frequency or count,
single-variant tests are known to be impractical because their results
are either underpowered or unreliable. Joint analyses that pool or
"collapse" multiple variants based on annotated gene group information
are therefore preferred in rare variant association tests.
However, the issue remains in a genome-wide association scan because
some regions always contain only a small number of variant sites. Moreover,
most current exome or genome sequencing association studies are still
limited to small sample sizes. Standard testing methods that rely on
asymptotic theory will also fail to preserve the type I error rate.
Together, these factors distort the final genome-wide quantile-quantile
plot of the test p-values. A penalized-likelihood method called Firth
logistic regression may provide a simple yet effective solution. It is
easier to implement and less computationally intensive than alternative
approaches such as permutation or bootstrapping, and it deserves more
attention in association studies of sequencing data.

The basic idea of Firth logistic regression is to introduce a more
effective score function by adding a term that counteracts the
first-order term in the asymptotic expansion of the bias of the maximum
likelihood estimate; this term goes to zero as the sample size increases
(Firth, 1993; Heinze and Schemper, 2002). For generalized linear models
with canonical links, as in logistic regression, Firth's approach is
equivalent to penalizing the likelihood by the Jeffreys invariant prior. The
attraction of this method is that it provides bias reduction for small
sample sizes and yields finite and consistent estimates even in the case
of separation. In a
binary response model, separation occurs when one variant is associated
with only one type of outcome, e.g., when all individuals who carry a
particular variant (although rare) are diagnosed with the disease. The
phenomenon is more common in rare variant studies, especially when a
recessive model is assumed. These variants are undoubtedly important but
will not be detected by standard statistical packages, which often
report large p-values (and exceptionally large standard errors),
sometimes without even a warning message. Although approaches
like Fisher's exact test and exact logistic regression can handle the
separation problem, their use becomes problematic when continuous
covariates need to be considered. Firth logistic regression is fairly
easy to implement because it is now available in many standard packages
(such as the R package "logistf"). In a recent work, Ma
et al. (2013) performed simulations to compare different methods for the
rare variant association test over varied designs and reported promising
results. They showed that the Firth-regression-based joint analysis of
individual-level data controls type I error well for both balanced and
unbalanced studies and is more powerful than score-test-based
meta-analysis.
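
As a rough illustration of the modified score described above, the following Python sketch fits a Firth-penalized logistic regression by Newton-type iterations. For logistic regression the modified score contribution of individual i is (y_i - pi_i + h_i(1/2 - pi_i)) x_i, where pi_i is the fitted probability and h_i is the leverage (Heinze and Schemper, 2002). The function names `solve` and `firth_logistic` and the toy data are assumptions made for this example; this is not the "logistf" implementation, and it omits safeguards such as step-halving that a production routine would likely include.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy of a
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def firth_logistic(X, y, max_iter=100, tol=1e-8):
    """Illustrative Firth logistic regression (hypothetical helper).

    X is a list of covariate rows (include a leading 1.0 for the intercept),
    y is a list of 0/1 outcomes. Returns the penalized coefficient estimates.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        pi = [1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, row))))
              for row in X]
        w = [q * (1.0 - q) for q in pi]
        # Fisher information X^T W X
        info = [[sum(w[i] * X[i][r] * X[i][c] for i in range(n))
                 for c in range(p)] for r in range(p)]
        # Leverages: h_i = w_i * x_i^T (X^T W X)^{-1} x_i
        h = [w[i] * sum(u * v for u, v in zip(X[i], solve(info, X[i])))
             for i in range(n)]
        # Firth's modified score: sum_i (y_i - pi_i + h_i (1/2 - pi_i)) x_i
        score = [sum((y[i] - pi[i] + h[i] * (0.5 - pi[i])) * X[i][r]
                     for i in range(n)) for r in range(p)]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Toy data with separation: all 3 carriers of the rare variant are cases.
X = [[1.0, 1.0]] * 3 + [[1.0, 0.0]] * 17  # columns: intercept, carrier indicator
y = [1] * 3 + [1] * 5 + [0] * 12          # 3 carrier cases, 5 other cases, 12 controls
beta = firth_logistic(X, y)
# With a single binary covariate, the Firth fit matches adding 0.5 to each
# cell of the 2x2 table, which gives a closed form to compare against.
closed_form = math.log(3.5 / 0.5) - math.log(5.5 / 12.5)
print(beta[1], closed_form)
```

In this toy example the ordinary maximum likelihood estimate of the carrier effect does not exist (it diverges), whereas the penalized estimate is finite and should agree closely with the closed-form value printed alongside it.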

However, methods and software are yet to be developed to handle analyses
with family or related samples. Two options are available for handling
familial correlations. One is to incorporate the Firth correction into
the structure of conditional logistic regression (CLR) (Heinze and Puhr,
2010). The other possibility, which may be easier, is based on
generalized estimating equations (GEE).
A simple approximation can be readily applied in practice by modifying
standard GEE in two steps. First, obtain the leverage values (the
diagonal of the hat matrix) from a GEE analysis with an independence
working correlation. Then add half of its leverage value to each
response before rerunning GEE with the chosen working correlation
matrix. This procedure will not completely remove the first-order term
of the bias but will adjust in that direction. The approximation will
guarantee finite estimates when separation occurs. Further investigation
is, however, needed to test the robustness of the suggested methods to
factors such as ascertainment and pedigree structure.
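
One way to see why shifting each response by half its leverage moves the estimating equations in Firth's direction is to rewrite the contribution of individual i to the modified score for logistic regression. This is an interpretation offered here, not a derivation taken from the cited work; the symbols \pi_i (fitted probability), h_i (leverage) and x_i (covariate vector) are notation assumed for this sketch.

```latex
\left[\, y_i - \pi_i + h_i\left(\tfrac{1}{2} - \pi_i\right) \right] x_i
  \;=\;
\left[\, \left(y_i + \tfrac{h_i}{2}\right) - \left(1 + h_i\right)\pi_i \,\right] x_i
```

The pseudo-response y_i + h_i/2 reproduces the first part of the adjustment. The factor (1 + h_i) multiplying \pi_i is not reproduced when only the response is shifted, which is consistent with the statement that the first-order bias term is only partly removed.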